Unsupervised classification for tiling arrays: ChIP-chip and transcriptome.
Authors
Abstract
Tiling arrays make large-scale exploration of the genome possible thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work, we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of the region classification. The "TAHMMAnnot" package is implemented in R and C and is freely available from CRAN.
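As a rough illustration of why spatial dependence between probes matters, the sketch below classifies a short run of probes twice: once probe by probe, and once with a two-state hidden Markov model decoded by the Viterbi algorithm. It is not the TAHMMAnnot implementation. The abstract does not spell out the model, so the two states, the state means, the shared standard deviation, the transition probabilities, and the probe values are all hypothetical choices made for the example. Each probe is a pair of intensities, one per condition, so the emission is a joint (bivariate) density.

```python
import math

# Hypothetical two-state model: 0 = background, 1 = enriched in both conditions.
# All numbers below are illustrative assumptions, not values from the paper.
MEANS = [(0.0, 0.0), (3.0, 3.0)]  # assumed (condition A, condition B) means
SD = 1.0                          # assumed shared standard deviation
LOG_INIT = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]


def log_emission(probe, state):
    """Log density of a bivariate Gaussian with covariance SD^2 * identity."""
    x, y = probe
    mx, my = MEANS[state]
    sq = ((x - mx) ** 2 + (y - my) ** 2) / SD ** 2
    return -0.5 * sq - math.log(2 * math.pi * SD ** 2)


def classify_independently(probes):
    """Assign each probe to its most likely state, ignoring its neighbours."""
    return [max(range(len(MEANS)), key=lambda s: log_emission(p, s))
            for p in probes]


def viterbi(probes):
    """Most likely state path when neighbouring probes tend to share a state."""
    if not probes:
        return []
    n_states = len(MEANS)
    score = [LOG_INIT[s] + log_emission(probes[0], s) for s in range(n_states)]
    back = []  # back[t][s] = best previous state for state s at step t + 1
    for probe in probes[1:]:
        pointers, new_score = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: score[r] + LOG_TRANS[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + LOG_TRANS[prev][s]
                             + log_emission(probe, s))
        back.append(pointers)
        score = new_score
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Made-up probe intensities; the fourth probe is a noisy reading inside a run.
PROBES = [(0.1, -0.2), (-0.3, 0.2), (3.1, 2.9),
          (1.3, 1.4), (2.8, 3.2), (3.0, 3.1)]

print(classify_independently(PROBES))
print(viterbi(PROBES))
```

With these assumed parameters, the probe-by-probe rule labels the fourth probe as background, while the Viterbi path keeps it in the enriched run, because two state switches cost more than one weak emission. Grouping consecutive probes with the same label would then give candidate regions, although the paper's own rule for non-connected regions is not described in the abstract.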
Similar resources
A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences
MOTIVATION Transcription factors (TFs) regulate gene expression by recognizing and binding to specific regulatory regions on the genome, which in higher eukaryotes can occur far away from the regulated genes. Recently, Affymetrix developed the high-density oligonucleotide arrays that tile all the non-repetitive sequences of the human genome at 35 bp resolution. This new array platform allows fo...
An integrated software system for analyzing ChIP-chip and ChIP-seq data (Supplementary Information)
ChIP-chip and ChIP-seq are powerful technologies to study transcriptional regulation in complex genomes; however, mining information from the huge datasets generated by these high-throughput technologies remains a non-trivial task. To analyze a ChIP-chip experiment, one usually starts with data exploration and then goes through normalization, binding region detection, adding gene annotatio...
Model-based analysis of tiling-arrays for ChIP-chip.
We propose a fast and powerful analysis algorithm, titled Model-based Analysis of Tiling-arrays (MAT), to reliably detect regions enriched by transcription factor chromatin immunoprecipitation (ChIP) on Affymetrix tiling arrays (ChIP-chip). MAT models the baseline probe behavior by considering probe sequence and copy number on each array. It standardizes the probe value through the probe model,...
Motif discovery and transcription factor binding sites before and after the next-generation sequencing era
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental te...
Double error shrinkage method for identifying protein binding sites observed by tiling arrays with limited replication
MOTIVATION ChIP-chip has been widely used for various genome-wide biological investigations. Given the small number of replicates (typically two to three) per biological sample, methods of analysis that control the variance are desirable but in short supply. We propose a double error shrinkage (DES) method by using moving average statistics based on local-pooled error estimates which effectivel...
Journal:
- Statistical Applications in Genetics and Molecular Biology
Volume 10, Issue 1
Pages: -
Publication date: 2011